Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model

نویسندگان

  • Miaomiao Wang
  • Keikichi Hirose
  • Nobuaki Minematsu
چکیده

The HMM-based speech synthesis system can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short term spectra, fundamental frequency (F0) and duration are generated by multi-stream HMMs separately. However the quality of synthetic speech degrades when feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and corresponding flawed voiced/unvoiced (VU) decisions are the two key factors in voice quality problems. Pitch tracking errors occur more often in Mandarin vowels of Tone 3 and Tone 4, because the pitch of these vowels can be very low and sometimes treated as aperiodic signal. On the other hand, F0 values in unvoiced regions, such as consonants, are normally defined as unavailable; it is then impossible to use standard HMMs for F0 modeling. Currently a preferred method to solve this is to use a multi-space distribution HMM (MSDHMM). In this approach, discrete distributions are used for modeling the VU decision and continuous Gaussian distributions are used for F0 modeling within the voiced regions. Due to this assumption of undefined F0 values in unvoiced regions and the special structure of MSDHMM, the generated F0 values are limited in accuracy. In this paper, an F0 generation process model is used to estimate F0 values in the region of pitch tracking errors, as well as in unvoiced regions. A prior knowledge of VU is imposed in each Mandarin phoneme and then used for VU decision. Thus the F0 can be modeled within the standard HMM framework.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using FO Contour Generation Process Model for Improved and Flexible Control of Prosodie Features in HMM-based Speech Synthesis

Generation process model of fundamental frequency contours known as Fujisaki's model is ideal to represent global features of prosody. It is a command response model, where the commands have clear relations with linguistic and para/non linguistic information included in the utterance. Therefore, by controlling fundamental frequency contours in the framework of the generation process model, a mo...

متن کامل

Improved Automatic Extraction of Generation Process Model Commands and Its use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis

Generation process model of fundamental frequency (F0) contours can well represent F0 movements of speech keeping a clear relation with linguistic information of utterances. Therefore, by using the model, improvement of HMM-based speech synthesis is expected. One of major problems preventing the use of the model is that the performance of automatic extraction of the model parameters from observ...

متن کامل

Fundamental Frequency Contour Reshaping in HMM-based Speech Synthesis and Realization of Prosodic Focus Using Generation Process Model

Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spreading a wide time span, such as words, phrases and so on. This causes an inherit problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our formerlydeveloped method, which modify generated F0 contours in the framework of the generation process mo...

متن کامل

Superpositional Modeling of Fundamental Frequency Contours for HMM-based Speech Synthesis

Statistical parametric speech synthesis technologies, such as HMM-based and DNN-based ones, gain special attention from researchers because of their ability in generating speech in various voice qualities and styles. In these methods, all acoustic parameters (except durational ones) are handled in a frame-by-frame manner, which is not appropriate for prosodic features. Although relation of adja...

متن کامل

Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model

A new method for generating sentence F0 contours of Mandarin speech is proposed. The method assumes the F0 contour generation process model, but generates the tone and phrase components in different ways and sums them to produce a sentence F0 contour. The tone component is generated concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme (binary decision trees). E...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009